
[SharkInference-SharkRuntime] Adds capability to mmap vmfbs #1540

Merged · 1 commit into nod-ai:main · Jun 22, 2023

Conversation

Abhishek-Varma
Contributor

-- This commit is based on the VmModule.mmap() API (iree-org/iree#14124).
-- It thereby adds the capability to mmap vmfbs in SHARK.

Signed-off-by: Abhishek Varma [email protected]
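
For context, a minimal sketch of the two loading paths this enables (the file name and setup are illustrative, not SHARK's actual code):

import iree.runtime as ireert

instance = ireert.VmInstance()

# Previous path: read the whole .vmfb into memory and hand the bytes to the VM.
with open("model.vmfb", "rb") as f:  # "model.vmfb" is a placeholder path
    buffer_module = ireert.VmModule.from_flatbuffer(instance, f.read())

# New path: memory-map the file so pages are loaded lazily and the
# module data can be shared instead of copied.
mmapped_module = ireert.VmModule.mmap(instance, "model.vmfb")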

@Abhishek-Varma
Contributor Author

So, tempfile.NamedTemporaryFile needs delete=False set for this to work on Windows.

Also, on Windows I see an API-related issue in VmModule.mmap at this line. The error is:

AttributeError: module 'mmap' has no attribute 'MAP_SHARED'
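
For reference, a minimal sketch of the Windows-friendly temp-file pattern (it assumes instance and flatbuffer_blob already exist; the names are illustrative, not the exact SHARK code):

import tempfile

import iree.runtime as ireert

# Assumes `instance` (an ireert.VmInstance) and `flatbuffer_blob`
# (compiled vmfb bytes) already exist; names are illustrative.
tmpf = tempfile.NamedTemporaryFile(suffix=".vmfb", delete=False)
tmpf.write(flatbuffer_blob)
tmpf.close()  # on Windows the file must be closed before it can be re-opened
vm_module = ireert.VmModule.mmap(instance, tmpf.name)
# delete=False means the temp file must be removed explicitly once it is
# safe to do so (see the unlinking discussion below).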

@powderluv
Contributor

This has landed again. iree-org/iree#14153

    mmaped_vmfb = ireert.VmModule.mmap(instance, flatbuffer_blob_or_path)
    context = ireert.VmContext(instance, modules=[hal_module, mmaped_vmfb])
else:
    tmpf = tempfile.NamedTemporaryFile(delete=False)
Contributor

this should be automatic now.

@powderluv
Contributor

Let's land this tomorrow so we can have mmap support.

@Abhishek-Varma
Contributor Author

Oh! I missed this - sure, I'll take a look at the upstream changes, update this PR accordingly, test, and mark it ready for review.

@Abhishek-Varma
Contributor Author

So, I'm trying to see how to unlink the mapped file: the callback leads to an error because, by the time it is triggered, the temporary file's name/data is already lost.
I'll test on Windows too after that - I see my setup was deleted.

Worst case, I'll confine mmap loading to the cases where the vmfb is actually generated and saved on disk, which removes the need for a temporary file altogether.

I've also switched the from_flatbuffer API usage over to the from_buffer API, which takes care of the warning that was being set off.

Will update here on my progress.
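
A minimal sketch of the "unlink only temp files" approach described above (helper name and arguments are illustrative, not SHARK's actual code):

import os
import sys
import tempfile

import iree.runtime as ireert

def load_vmfb_module(instance, flatbuffer_blob_or_path, is_path):
    if is_path:
        # A vmfb that was generated and saved on disk: never unlink it,
        # otherwise the artifact itself would be deleted.
        return ireert.VmModule.mmap(instance, flatbuffer_blob_or_path)

    # In-memory flatbuffer: spill to a temporary file so it can be mmapped.
    tmpf = tempfile.NamedTemporaryFile(suffix=".vmfb", delete=False)
    tmpf.write(flatbuffer_blob_or_path)
    tmpf.close()
    vm_module = ireert.VmModule.mmap(instance, tmpf.name)

    if sys.platform != "win32":
        # On POSIX the mapping stays valid after unlink, so the temp file
        # can be removed immediately; on Windows leave cleanup for later.
        os.unlink(tmpf.name)
    return vm_module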

@powderluv

@Abhishek-Varma
Contributor Author

I initially verified the loading on CUDA.
I then shifted to CPU (because the CUDA VM got preempted) and continued there - I've kept the unlinking limited to temporary files only, but based on a few comments upstream I've commented those lines out. Apart from the test script I was using, I also verified that StableDiffusion's vmfb gets loaded properly.
On CPU I also verified that execution takes the load_module path, where I do not perform unlinking - otherwise the generated vmfb itself would get deleted!

On Windows I tested with my script, which compiles a basic module and exercises the different paths SHARK's compilation can take (with mmap switched ON and OFF) - see the sketch below.
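
A minimal standalone check along those lines, exercising both loading paths directly with the IREE Python bindings (the module, backend, and driver here are illustrative and not the actual test script):

import tempfile

import numpy as np
import iree.compiler as ireec
import iree.runtime as ireert

MLIR = """
func.func @add(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.addf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

def run(use_mmap):
    config = ireert.Config("local-task")
    vmfb = ireec.compile_str(MLIR, target_backends=["llvm-cpu"])
    if use_mmap:
        # mmap path: spill the flatbuffer to a file and map it.
        tmpf = tempfile.NamedTemporaryFile(suffix=".vmfb", delete=False)
        tmpf.write(vmfb)
        tmpf.close()
        vm_module = ireert.VmModule.mmap(config.vm_instance, tmpf.name)
    else:
        # buffer path: hand the in-memory flatbuffer to the VM.
        vm_module = ireert.VmModule.from_flatbuffer(config.vm_instance, vmfb)
    ctx = ireert.SystemContext(vm_modules=[vm_module], config=config)
    a = np.arange(4, dtype=np.float32)
    b = np.ones(4, dtype=np.float32)
    return ctx.modules.module["add"](a, b).to_host()

for use_mmap in (False, True):
    print("mmap" if use_mmap else "buffer", run(use_mmap))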

After a while I switched over to CUDA and tried re-installing shark.venv using setup_venv.sh in my branch.
It gave the following error:

Looking in links: https://llvm.github.io/torch-mlir/package-index/, https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html, https://download.pytorch.org/whl/nightly/torch/
Obtaining file:///home/abhishek/my-shark
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [39 lines of output]
      Looking in links: https://llvm.github.io/torch-mlir/package-index/, https://nod-ai.github.io/SHARK-Runtime/pip-release-links.html, https://download.pytorch.org/whl/nightly/torch/
      Collecting setuptools>=42
        Using cached setuptools-68.0.0-py3-none-any.whl (804 kB)
      Collecting wheel
        Using cached wheel-0.40.0-py3-none-any.whl (64 kB)
      Collecting packaging
        Using cached packaging-23.1-py3-none-any.whl (48 kB)
      Collecting numpy>=1.22.4
        Using cached numpy-1.25.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
      Collecting torch-mlir>=20221021.633
        Using cached torch_mlir-20221213.686-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.7 MB)
      Collecting iree-compiler>=20221022.190
        Using cached iree_compiler-20230524.529-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (55.8 MB)
      Collecting iree-runtime>=20221022.190
        Using cached iree_runtime-20230524.529-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB)
      INFO: pip is looking at multiple versions of torch-mlir to determine which version is compatible with other requirements. This could take a while.
      Collecting torch-mlir>=20221021.633
        Using cached torch_mlir-20221212.685-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.7 MB)
        Using cached torch_mlir-20221211.684-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.6 MB)
        Using cached torch_mlir-20221210.683-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.6 MB)
        Using cached torch_mlir-20221209.682-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.6 MB)
        Using cached torch_mlir-20221208.681-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (221.6 MB)
        Using cached torch_mlir-20221206.71-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (219.6 MB)
      ERROR: Cannot install torch-mlir==20221206.71, torch-mlir==20221208.681, torch-mlir==20221209.682, torch-mlir==20221210.683, torch-mlir==20221211.684, torch-mlir==20221212.685 and torch-mlir==20221213.686 because these package versions have conflicting dependencies.
      
      The conflict is caused by:
          torch-mlir 20221213.686 depends on torch==2.0.0.dev20221211
          torch-mlir 20221212.685 depends on torch==2.0.0.dev20221211
          torch-mlir 20221211.684 depends on torch==1.14.0.dev20221205
          torch-mlir 20221210.683 depends on torch==1.14.0.dev20221205
          torch-mlir 20221209.682 depends on torch==1.14.0.dev20221205
          torch-mlir 20221208.681 depends on torch==1.14.0.dev20221205
          torch-mlir 20221206.71 depends on torch==1.14.0.dev20221122
      
      To fix this you could try to:
      1. loosen the range of package versions you've specified
      2. remove package versions to allow pip attempt to solve the dependency conflict
      
      ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

The iree-compiler version is also quite old (20230524.529 instead of 20230620.434) when I check with pip list on my Linux CUDA VM.

I speculate that's why CI is failing as well. @monorimet

@powderluv

@powderluv
Contributor

You're on Python 3.10; using 3.11 will fix it.

Collaborator

@monorimet left a comment


LGTM! Thanks for this.

@Abhishek-Varma merged commit cdd505e into nod-ai:main on Jun 22, 2023
6 checks passed